ABSTRACT

A parametric restriction is often interesting because it makes
possible simplifications or improvements in estimators for the
parameters of primary interest. In such cases, a specification
test examines the effect of imposing the restrictions on the
estimator, whereas classical tests examine the restrictions
themselves in light of the data. In some circumstances, this
leads to discrepancies in large sample behavior between
(i) specification tests and (ii) likelihood ratio, Wald, and
Lagrange multiplier tests. We examine this distinction in three
cases of recent interest: exclusion restrictions in a simple
linear model, parametric restrictions in a general non-linear
implicit model, and exogeneity restrictions in a simultaneous
equations model.


1. Introduction

A tenet of large sample statistical theory is the sufficiency of
the trinity of tests: that any reasonable test of a statistical
hypothesis is at least asymptotically equivalent to a likelihood
ratio, Wald, or Lagrange multiplier test. Recently, a class of
mis-specification tests was introduced (Hausman (1978)) which
makes use of the difference between parameter estimates which
impose and do not impose a null hypothesis, and some speculation
has ensued concerning the relationships among these tests and the
trinity. Holly (1980a, b) in particular has compared the
specification test of (i) parametric restrictions with nuisance
parameters, and (ii) exogeneity in a triangular simultaneous
equations system with the conventional tests and has found - in
some circumstances - significant differences in large sample
behavior. In this paper, we explain these discrepancies in terms
which shed some light on the hypothesis that the specification
test is actually testing.

A parametric restriction is often interesting because it enables
us to simplify or improve our estimator for the parameters of
primary interest. Thus an uninteresting variable may be included
in a linear regression or a variable may be treated as endogenous
solely to ensure consistent estimates of the parameters of
interest. In both cases, the restriction involved may be tested,
but the cost of a type I or II error depends upon the effect of
that error on the point estimates of the parameters that matter.
Thus excluding a variable from a linear regression is costly only
to the extent that estimates of the remaining slope coefficients
change, given their sampling errors. Similarly, the cost of
treating a variable as predetermined depends upon the difference
that restriction makes in the point estimates involved.

The specification test is based upon the difference in the point
estimates caused by imposing the restrictions. In the cases above,
imposing the restrictions gains a little efficiency (if true) but
sacrifices consistency if false. Thus the specification test
detects departures from the restrictions in an appropriate norm,
and this is shown below to characterize the difference between
specification tests and conventional tests. Indeed, in the cases
considered below, we are able to show that the specification test
of a set of restrictions is identical to a conventional test of
the specification hypothesis that imposing the restrictions does
not affect the point estimates. Being asymptotically equivalent
to a likelihood ratio (etc.) test, it has the familiar optimal
local power properties, so that in circumstances in which the
specification hypothesis is the relevant hypothesis, the
specification test is probably the test of choice.

In essence, these results point out the importance of carefully
specifying the null hypothesis one wishes to test. In the linear
regression model in section 2, we show that the common practice
of omitting variables whose estimated coefficients fail an F test
is often based on a test whose nominal size is smaller than its
true size. In such cases, the specification test is uniformly
most powerful among invariant tests of the specification
hypothesis and clearly dominates the F test. These and other
results are extended in section 3 to explain anomalies in the
specification test in non-linear models outlined in Holly
(1980a). These principles are applied to exogeneity tests in
simultaneous equations systems (section 4). This generalizes and
explains the difference between the specification test and the
Lagrange multiplier test for recursiveness in triangular systems
discussed in Holly (1980b).


2. Specification tests in the linear model

2.1. The model and the m test

The basic idea discussed in the introduction can be demonstrated
simply by specification tests involving linear homogeneous
restrictions in the linear model. Suppose

(2.1)    $Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$

where $\beta_1$, $\beta_2$ are $k_1$, $k_2$ vectors of unknown
parameters ($k_1 + k_2 = k$) and $\varepsilon$ is a $T$ vector of
Gauss-Markov disturbances. The coefficients $\beta_1$ are presumed
to be of primary interest, and the columns of $X_2$ are included
in equation (2.1) solely to avoid specification error in our
estimates of $\beta_1$. For a practical example, interpret
equation (2.1) as a demand function for the services of a
regulated public utility in which the scalar $X_1$ represents the
service price. In a rate hearing, point estimates of the price
elasticity are required in order to set prices in the next period
to achieve an assigned rate of return. Effects of changes in other
prices and income (the columns of $X_2$) are of interest only
insofar as they influence our estimate of $\beta_1$.

Typically, we test the null hypothesis

    $H_0$: $\beta_2 = 0$

against the alternative

    $H_1$: $\beta_2 \neq 0$.

In this case, though, there is some suggestion that certain
functions of $\beta_2$ are more important than others. This point
will be developed at length below.

An alternative interpretation of this problem can be set in the
context of limited information simultaneous equations estimation,
which we discuss in Section 4. Consider

(2.2)    $Y = X_1\beta_1 + [X_2\beta_2 + \varepsilon] = X_1\beta_1 + e$

where $\operatorname{plim} \frac{1}{T}X_i'\varepsilon = 0$
($i = 1,2$). We are concerned with estimating $\beta_1$, and must
determine if

    $\operatorname{plim} \frac{1}{T}X_1'e = \operatorname{plim} \frac{1}{T}X_1'X_2\beta_2 = 0$

in order that least squares estimates of $\beta_1$ be consistent.
Again, one typically examines $H_0$: $\beta_2 = 0$ but recognizes
that estimable functions of the form
$A'\beta_2 = C'X_1'X_2\beta_2$ are of particular interest.

Following Hausman (1978), both interpretations of the problem
lead to the same specification test. For the linear model case,
the specification test involves the difference between the least
squares estimates of $\beta_1$ including and excluding $X_2$ from
the model; i.e., the test is based on the length of

    $q = (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y$

where $Q_i = I_T - X_i(X_i'X_i)^{-1}X_i' = I_T - P_i$ for
$i = 1,2$. For the simultaneous equations interpretation, we
compare the two stage least squares (2SLS) estimates of $\beta_1$
in equation (2.2) using $Q_2$ as instruments with those using
$W = [X_1 : Q_2]$ as instruments. Since the columns of $Q_2X_1$
are uncorrelated with $e$ under both $H_0$ and $H_1$, this
comparison represents a test of the hypothesis that
$\operatorname{plim} \frac{1}{T}X_1'e = 0$. This specification
test is based upon

    $(X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'P_WX_1)^{-1}X_1'P_WY$

        $= (X_1'Q_2X_1)^{-1}X_1'Q_2Y - (X_1'X_1)^{-1}X_1'Y = q$

where $P_W$ represents the orthogonal projection operator onto
the column space of $W$ and $P_WX_1 = X_1$.

Under $H_0$, $E(q \mid X_1, X_2) = 0$ so that we reject the null
hypothesis if the length of $q$ differs from 0 by more than
sampling error. Hence the specification test statistic is

    $m = q'[\mathrm{Var}(q)]^+q$

where $[\,]^+$ denotes the Moore-Penrose generalized inverse of
$[\,]$.^1 This represents a modification of the procedure
proposed by Hausman (1978) to allow for the possible singularity
of the matrix

    $\mathrm{Var}(q) = \sigma^2[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]$

where $\mathrm{Var}(\varepsilon) = \sigma^2I_T$. If we write

    $q = [(X_1'Q_2X_1)^{-1}X_1'Q_2 - (X_1'X_1)^{-1}X_1']Y = DY$,

note that $\mathrm{Var}(q) = \sigma^2DD'$.

^1 All results in this paper hold for any consistently defined
generalized inverse.
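As a minimal numerical sketch of the statistic just defined (an illustration, not part of the original derivation), the following NumPy fragment builds $D$, $q$ and $m$ for a simulated design and compares $DD'$ with the closed form for $\mathrm{Var}(q)$. The simulated data, the seed, the value $\sigma^2 = 1$, and the helper names `annihilator` and `m_statistic` are all assumptions made for the example.

```python
import numpy as np

def annihilator(X):
    """Return Q = I - X (X'X)^{-1} X' for a full-column-rank X."""
    T = X.shape[0]
    return np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

def m_statistic(Y, X1, X2, sigma2=1.0):
    """Return (q, D, m) with q = D Y and m = q' [sigma2 * D D']^+ q."""
    Q2 = annihilator(X2)
    with_x2 = np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2)  # beta_1 estimator including X2
    without_x2 = np.linalg.solve(X1.T @ X1, X1.T)         # beta_1 estimator excluding X2
    D = with_x2 - without_x2
    q = D @ Y
    m = float(q @ np.linalg.pinv(sigma2 * D @ D.T, rcond=1e-10) @ q)
    return q, D, m

# Hypothetical simulated design (assumption): X2 is correlated with X1.
rng = np.random.default_rng(0)
T, k1, k2 = 80, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))
Y = X1 @ np.ones(k1) + rng.normal(size=T)   # H0: beta_2 = 0 holds

q, D, m = m_statistic(Y, X1, X2)
Q2 = annihilator(X2)
var_q_closed_form = np.linalg.inv(X1.T @ Q2 @ X1) - np.linalg.inv(X1.T @ X1)
print("q =", q, " m =", m)
```

The generalized inverse is taken with an explicit cutoff (`rcond=1e-10`) so that numerically zero singular values of $DD'$ are discarded when $k_1 > k_2$.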

Lemma 2.1: (Eaton (1972) Proposition 3.19, p. 3.10). Suppose
$Y \sim N(\mu, \Sigma)$ where $\mu$ is an element of the range
space of $\Sigma$, which is possibly singular. If $A$ is
symmetric and $\Sigma = DD'$, then $Y'AY$ is distributed as
non-central $\chi^2$ with degrees of freedom = rank($D'AD$) and
non-centrality parameter = $\mu'A\mu$ if and only if $D'AD$ is
idempotent.

Proposition 2.1: If $\varepsilon$ is normally distributed, then
under $H_0$, $\sigma^2 m \sim \chi^2_d$ where $d$ = rank($D$).

Proof: Under $H_0$, $q = DY = D\varepsilon$, so that

    $\sigma^2 m = \varepsilon'D'(DD')^+D\varepsilon$.

The proof follows from the lemma since $D'(DD')^+D$ is idempotent
for any generalized inverse $(\,)^+$.

The assumption of normality can be relaxed; and assuming that
$\frac{1}{T}X_1'X_1$ and $\frac{1}{T}X_2'X_2$ converge to
non-singular matrices, we can derive the limiting distribution
of $m$.

Proposition 2.2: Under the null hypothesis, $\sigma^2 m$
converges in distribution to a $\chi^2_d$ random variable.

Proof: Writing

    $m = \sqrt{T}q'[T\,\mathrm{Var}(q)]^+\sqrt{T}q$,

observe that $\sqrt{T}q \xrightarrow{d} N(0, \sigma^2DD')$, where
in an abuse of notation $\sigma^2DD'$ denotes

    $\lim_{T\to\infty} T\,\mathrm{Var}(q) = \lim_{T\to\infty} [(\frac{1}{T}X_1'Q_2X_1)^{-1} - (\frac{1}{T}X_1'X_1)^{-1}]\sigma^2$.

The proof follows from the lemma again, since $D'(DD')^+D$ is
idempotent.

Using the lemma, it follows immediately that

Corollary 2.1: Under $H_1$, $\sigma^2 m$ is distributed as
non-central $\chi^2_d$ with non-centrality parameter

    $\lambda_m = \beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2$

either asymptotically, or - assuming normality - in finite
samples.

The constant $\sigma^2$ can be removed from the above
propositions. For Proposition 2.2, any consistent estimator can
be substituted for $\sigma^2$. For Proposition 2.1, let $s^2$
denote $\frac{1}{T-k}$ times the sum of squared least squares
residuals from equation (2.1).

Corollary 2.2: Assuming $\varepsilon$ is normally distributed,

    $m_1 = \frac{1}{d}[q'(DD')^+q/s^2] \sim F(d, T-k)$

under $H_0$.

Proof: Let $Q = I_T - X(X'X)^{-1}X'$ where $X = [X_1 : X_2]$.
Then $s^2$ is proportional to $\varepsilon'Q\varepsilon$ and $m$
is proportional to $\varepsilon'D'(DD')^+D\varepsilon$ under
$H_0$. Since $D'$ lies in the column space of $X$,
$QD'(DD')^+D = 0$; $s^2$ and $m$ are thus independent by
Proposition 3.30 (page 3.15) of Eaton (1972).

In general, we will assume that the columns of $X_1$ and $X_2$
are not orthogonal - in particular, that $X_1'X_2$ has full rank,
so that rank($X_1'X_2$) = rank($X_2'X_1$) = min($k_1$, $k_2$).
Moreover, we will assume we have sufficient observations to
estimate all $k$ parameters; thus $T - k_j > k_i$ and
$X_i'Q_jX_i$ is non-singular for $i \neq j = 1,2$. Under these
assumptions,

Proposition 2.3: The degrees of freedom for the m test are given
by $d$ = rank($D$) = min($k_1$, $k_2$).

Proof: From the definition of $D$,

    $-X_1'X_1D = X_1' - (X_1'X_1)(X_1'Q_2X_1)^{-1}X_1'Q_2$.

Since $P_2 + Q_2 = I_T$,

    $-X_1'X_1D = X_1'P_2 + [I - (X_1'X_1)(X_1'Q_2X_1)^{-1}]X_1'Q_2$

             $= X_1'P_2 + [X_1'Q_2X_1 - X_1'X_1](X_1'Q_2X_1)^{-1}X_1'Q_2$

             $= X_1'P_2 - [X_1'P_2X_1](X_1'Q_2X_1)^{-1}X_1'Q_2$.

Thus

    $-D = (X_1'X_1)^{-1}X_1'P_2[I - X_1(X_1'Q_2X_1)^{-1}X_1'Q_2]$

so that $d \leq$ min[rank($(X_1'X_1)^{-1}X_1'P_2$),
rank($I - X_1(X_1'Q_2X_1)^{-1}X_1'Q_2$)]. Since
rank($X_1'X_2$) = min($k_1$, $k_2$) and
$I - X_1(X_1'Q_2X_1)^{-1}X_1'Q_2$ is idempotent of rank $T - k_1$,
$d \leq$ min($k_1$, $k_2$). To remove the inequality, observe
that $DP_2 = -(X_1'X_1)^{-1}X_1'P_2$, whose rank equals
min($k_1$, $k_2$); thus rank($D$) $\geq$ min($k_1$, $k_2$),
which, in conjunction with $d \leq$ min($k_1$, $k_2$), completes
the proof.
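The rank result of Proposition 2.3 can be checked numerically. The sketch below is illustrative only: it assumes randomly generated regressors with $X_1'X_2$ of full rank, and the helper names `contrast_matrix` and `rank_of_D` are hypothetical. An explicit tolerance is passed to `numpy.linalg.matrix_rank` so that rounding error is not counted as rank.

```python
import numpy as np

def contrast_matrix(X1, X2):
    """D such that q = D Y (difference of the two estimators of beta_1)."""
    T = X1.shape[0]
    Q2 = np.eye(T) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
    return (np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2)
            - np.linalg.solve(X1.T @ X1, X1.T))

def rank_of_D(k1, k2, T=60, seed=1):
    """Numerical rank of D for a simulated design (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))
    return int(np.linalg.matrix_rank(contrast_matrix(X1, X2), tol=1e-8))

ranks = {(k1, k2): rank_of_D(k1, k2) for (k1, k2) in [(2, 4), (4, 2), (3, 3)]}
print(ranks)
```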

2.2. Comparing the m and F tests

For the null hypothesis $H_0$: $\beta_2 = 0$, the F test is based
on the length of the least squares estimate of $\beta_2$ in
equation (2.1):

(2.3)    $F = \hat\beta_2'[\mathrm{Var}(\hat\beta_2)]^{-1}\hat\beta_2 = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y/\sigma^2$

From now on, for convenience, we will assume that $\sigma^2$ is
known to be 1, so that all relevant tests are $\chi^2$ tests.
Since $s^2$ is independent of both $m$ and $F$, this
simplification will not affect our results. Note that under
$H_0$, $F$ is distributed as $\chi^2_{k_2}$ and that under $H_1$,
it is distributed as non-central $\chi^2_{k_2}$ with
non-centrality parameter
$\lambda_F = \beta_2'X_2'Q_1X_2\beta_2$.

In the context of specification tests, a related null hypothesis
of some interest is

    $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$

with corresponding alternative
$H_1^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 \neq 0$. Note that $H_0^*$
represents the hypothesis that the bias in the least squares
estimate of $\beta_1$ when $X_2$ is omitted is zero. In some
circumstances, the potential distinction between $H_0$ and
$H_0^*$ will be quite important. Heuristically, $H_0$ is a set of
restrictions on all estimable functions of $\beta_2$, whereas
$H_0^*$ restricts only a subset of them. Formally,

Proposition 2.4: The restrictions represented by $H_0$ and
$H_0^*$ are identical if and only if $k_1 \geq k_2$.

Proof: $H_0$: $\beta_2 = 0$ always implies
$H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$, so we need to verify
only the reverse implication. If $k_1 \geq k_2$, then
$X_2'P_1X_2$ is non-singular. Premultiplying $H_0^*$ by
$(X_2'P_1X_2)^{-1}X_2'X_1$ yields

    $(X_2'P_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0 \;\Rightarrow\; \beta_2 = 0$.

If $k_1 < k_2$, the null space of $X_1'X_2$ contains non-zero
vectors, so that $X_1'X_2\beta_2 = 0$ does not imply that
$\beta_2 = 0$.

The relationship between $H_0$ and $H_0^*$ is reflected in the
corresponding F tests of $H_0$ and $H_0^*$. Let $F^*$ denote the
length of the least squares estimate of
$(X_1'X_1)^{-1}X_1'X_2\beta_2$; i.e.,

    $F^* = \hat\beta_2'X_2'X_1(X_1'X_1)^{-1}\{\mathrm{Var}[(X_1'X_1)^{-1}X_1'X_2\hat\beta_2]\}^+(X_1'X_1)^{-1}X_1'X_2\hat\beta_2$

(2.4)    $= Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'X_1[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]^+$

             $\times\; X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$.

Under $H_0^*$, $F^*$ is distributed as $\chi^2$ with degrees of
freedom equal to rank[$X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1$]; under
$H_1^*$, its non-centrality parameter can be shown to be

    $\lambda_{F^*} = \beta_2'X_2'X_1[X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1]^+X_1'X_2\beta_2$.

To compare $F$ and $F^*$, two little-known (to us) matrix
identities will be useful. Let $A$, $B$, $C$, and $D$ be
conformable matrices with $D$ non-singular (and square). Assume
$C'A^+B + D^{-1}$ is non-singular, and denote the column space of
a matrix by $M(\cdot)$.

Lemma 2.2: (Rao and Mitra (1971) pp. 70-71). If
$M(B) \subset M(A)$ and $M(C) \subset M(A)$, then

    $[A + BDC']^+ = A^+ - A^+B[C'A^+B + D^{-1}]^{-1}C'A^+$.

A version of this result for non-singular $A$ and $BDC'$ appears
in Smith (1973).

Lemma 2.3: (Rao and Mitra (1971) p. 22). If
rank($ABC$) = rank($B$), then

    $C[ABC]^+A = B^+$.

Both of these lemmas hold with minor modifications for any
generalized inverse. From Lemma 2.2, we can easily derive a
result which is very useful in linear model manipulations.

Lemma 2.4: In the notation of equation (2.1):

    $(X_i'Q_jX_i)^{-1} = (X_i'X_i)^{-1} + (X_i'X_i)^{-1}X_i'X_j[X_j'Q_iX_j]^{-1}X_j'X_i(X_i'X_i)^{-1}$

where $i \neq j = 1,2$.

Proof: Let $A = X_i'X_i$, $B = C = X_i'X_j$, and
$D = -(X_j'X_j)^{-1}$. Since $A$ is non-singular,
$M(B) = M(C) \subset M(A)$; applying Lemma 2.2 with some algebra
concludes the proof. Note that if $k_i > k_j$, $BDC'$ must be
singular so that Smith's result is inapplicable.
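Lemma 2.4 is easy to confirm numerically. The following sketch evaluates both sides of the identity for $(i,j) = (1,2)$ and $(2,1)$ on a simulated design; the data, the seed, and the helper names `Q` and `lemma_2_4_gap` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, k2 = 50, 4, 2                     # k1 > k2, so B D C' is singular for (i, j) = (1, 2)
X = {1: rng.normal(size=(T, k1)), 2: rng.normal(size=(T, k2))}
X[2] = X[2] + X[1][:, :k2]               # make the two blocks correlated

def Q(Z):
    """Annihilator I - Z (Z'Z)^{-1} Z'."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

def lemma_2_4_gap(i, j):
    """Largest absolute difference between the two sides of Lemma 2.4."""
    Xi, Xj = X[i], X[j]
    lhs = np.linalg.inv(Xi.T @ Q(Xj) @ Xi)
    XtX_inv = np.linalg.inv(Xi.T @ Xi)
    middle = np.linalg.inv(Xj.T @ Q(Xi) @ Xj)
    rhs = XtX_inv + XtX_inv @ Xi.T @ Xj @ middle @ Xj.T @ Xi @ XtX_inv
    return float(np.max(np.abs(lhs - rhs)))

gaps = {(i, j): lemma_2_4_gap(i, j) for (i, j) in [(1, 2), (2, 1)]}
print(gaps)
```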
Given the relationship between $H_0$ and $H_0^*$ in Proposition
2.4, the following relationship between their associated F tests
is obvious:

Proposition 2.5: If $k_1 \geq k_2$, $F = F^*$.

Proof: From equation (2.4),

    $F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}[(X_2'Q_1X_2)^{-1}]^{-1}(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

by Lemma 2.3, since
rank($X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1$) = rank[$(X_2'Q_1X_2)^{-1}$]
if and only if $k_1 \geq k_2$. Thus if $k_1 \geq k_2$,

    $F^* = Y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y = F$

from equation (2.3).

These tests are then related to the m test for specification
error by the following argument. $F^*$ and $m$ can be thought of
as the length of two different estimates of the bias in the least
squares estimate of $\beta_1$ from omitting $X_2$: the actual
bias is

    $B = (X_1'X_1)^{-1}X_1'X_2\beta_2$

and the different estimates are

(2.5)    $\hat B_{F^*} = (X_1'X_1)^{-1}X_1'X_2\hat\beta_2 = (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1Y$

and

    $\hat B_m = [(X_1'Q_2X_1)^{-1}X_1'Q_2 - (X_1'X_1)^{-1}X_1']Y$.

Note that $\hat B_m$ can be regarded as an estimate of the bias
since its expectation is $-(X_1'X_1)^{-1}X_1'X_2\beta_2$.

Proposition 2.6: For any $k_1$, $k_2$, $F^* = m$.

Proof: $F^* = \hat B_{F^*}'[\mathrm{Var}(\hat B_{F^*})]^+\hat B_{F^*}$
and $m = \hat B_m'[\mathrm{Var}(\hat B_m)]^+\hat B_m$, so that
$F^* = m$ if $\hat B_{F^*} = -\hat B_m$, which we verify below.
Using equation (2.5) and Lemma 2.4, we can write

    $\hat B_{F^*} = (X_1'X_1)^{-1}X_1'P_2Q_1Y + (X_1'X_1)^{-1}X_1'P_2X_1(X_1'Q_2X_1)^{-1}X_1'P_2Q_1Y$.

Replacing $P_2$ by $I_T - Q_2$ yields

    $\hat B_{F^*} = -(X_1'Q_2X_1)^{-1}X_1'Q_2Q_1Y = (X_1'Q_2X_1)^{-1}X_1'Q_2P_1Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y$

                 $= (X_1'X_1)^{-1}X_1'Y - (X_1'Q_2X_1)^{-1}X_1'Q_2Y$

                 $= -\hat B_m$.
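Propositions 2.5 and 2.6 can be illustrated with a short simulation. The sketch below computes $F$ from (2.3), $F^*$ from (2.4) and $m$ from its definition, with $\sigma^2$ normalised to 1; the simulated data and the helper names `proj`, `three_statistics` and `simulate` are assumptions for the example, not the authors' code.

```python
import numpy as np

def proj(Z):
    """Orthogonal projector onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def three_statistics(Y, X1, X2):
    """Return (F, F_star, m) with sigma^2 normalised to 1."""
    T = len(Y)
    Q1, Q2 = np.eye(T) - proj(X1), np.eye(T) - proj(X2)
    G_inv = np.linalg.inv(X2.T @ Q1 @ X2)
    b2 = G_inv @ X2.T @ Q1 @ Y                      # least squares estimate of beta_2
    F = float(b2 @ X2.T @ Q1 @ X2 @ b2)             # equation (2.3)
    C = X1.T @ X2
    middle = np.linalg.pinv(C @ G_inv @ C.T, rcond=1e-10)
    F_star = float(b2 @ C.T @ middle @ C @ b2)      # equation (2.4)
    D = (np.linalg.solve(X1.T @ Q2 @ X1, X1.T @ Q2)
         - np.linalg.solve(X1.T @ X1, X1.T))
    q = D @ Y
    m = float(q @ np.linalg.pinv(D @ D.T, rcond=1e-10) @ q)
    return F, F_star, m

def simulate(k1, k2, T=80, seed=3):
    """Hypothetical design with beta_2 != 0 (assumption for illustration)."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))
    Y = X1 @ np.ones(k1) + X2 @ np.full(k2, 0.3) + rng.normal(size=T)
    return three_statistics(Y, X1, X2)

F_a, Fs_a, m_a = simulate(k1=4, k2=2)   # k1 >= k2: F, F* and m coincide
F_b, Fs_b, m_b = simulate(k1=2, k2=4)   # k1 <  k2: F* = m, but F differs
print(F_a, Fs_a, m_a)
print(F_b, Fs_b, m_b)
```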

In summary, the specification test statistic $m$ is equal (for
all $k_1$ and $k_2$) to the F test statistic $F^*$ for the
hypothesis that certain linear functions of $\beta_2$ equal zero.
If $k_1 \geq k_2$, both $m$ and $F^*$ are equal to the F test
statistic $F$ for the hypothesis that all linear functions of
$\beta_2$ are zero. Since the latter F test is in common use, it
is interesting to compare it with the m (and $F^*$) test when
$k_1 < k_2$ and they are different.

2.3. F and m compared when $k_1 < k_2$

At the outset, we must specify which null hypothesis is under
consideration. Under $H_0$, $F$ and $m$ are distributed as
central $\chi^2$ random variables with $k_2$ and $k_1$ degrees of
freedom respectively - recalling that $k_1 < k_2$ in this
section. Under $H_0^*$, however, $m$ is distributed as central
$\chi^2$ with $k_1$ degrees of freedom but $F$ has a non-central
$\chi^2_{k_2}$ distribution with non-centrality parameter
$\lambda_F = \beta_2'X_2'Q_1X_2\beta_2 \geq 0$. Thus as a test
statistic for $H_0^*$, $F$ does not have the usual $\chi^2_{k_2}$
distribution. If we mistakenly use that distribution - i.e., if
we test $H_0$ when we should test $H_0^*$ - we obtain a test
whose nominal size is smaller than its true size.
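The size distortion can be illustrated by simulation. The sketch below is a rough Monte Carlo under stated assumptions: a simulated design with $k_1 = 1$, $k_2 = 3$; a $\beta_2 \neq 0$ taken from the null space of $X_1'X_2$ (so that $H_0^*$ holds) and rescaled so that $\lambda_F = 5$; $\sigma^2 = 1$; and the tabulated 5% critical value 7.815 of $\chi^2_3$. The F statistic is drawn directly from the central and non-central $\chi^2_{k_2}$ distributions stated above rather than recomputed from data.

```python
import numpy as np

rng = np.random.default_rng(4)
T, k1, k2 = 100, 1, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))
Q1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# A beta_2 != 0 that satisfies H0*: the last right singular vector of X1'X2
# lies in its null space because k1 < k2.
_, _, Vt = np.linalg.svd(X1.T @ X2)
beta2 = Vt[-1]
beta2 = beta2 * np.sqrt(5.0 / float(beta2 @ X2.T @ Q1 @ X2 @ beta2))
lam_F = float(beta2 @ X2.T @ Q1 @ X2 @ beta2)      # equals 5 by construction

crit = 7.815                                        # 5% critical value of chi-square(3)
n = 200_000
F_under_H0 = rng.chisquare(k2, size=n)
F_under_H0_star = rng.noncentral_chisquare(k2, lam_F, size=n)
nominal_size = float(np.mean(F_under_H0 > crit))
true_size = float(np.mean(F_under_H0_star > crit))
print(nominal_size, true_size)
```

In this configuration the rejection frequency under $H_0^*$ is well above the nominal 5% level, which is the sense in which the F test has the wrong size for $H_0^*$.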

Considered, then, as a test of $H_0$, $m$ has strictly fewer
degrees of freedom than $F$. However, under the alternative
hypothesis,

Proposition 2.7: The non-centrality parameter of the m test is
less than or equal to that of the F test; i.e., when
$k_1 < k_2$, $\lambda_m \leq \lambda_F$ for all $\beta_2$.

Proof: From Corollary 2.1,

    $\lambda_m = \beta_2'X_2'X_1(X_1'X_1)^{-1}[(X_1'Q_2X_1)^{-1} - (X_1'X_1)^{-1}]^+(X_1'X_1)^{-1}X_1'X_2\beta_2$

              $= \beta_2'X_2'[X_1(X_1'P_2X_1)^{-1}X_1' - P_1]X_2\beta_2$

using Lemma 2.4. Thus

    $\lambda_m = \beta_2'X_2'[P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2 - P_1]X_2\beta_2$;

since $P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2$ is idempotent,

    $\beta_2'X_2'(P_2X_1(X_1'P_2X_1)^{-1}X_1'P_2)X_2\beta_2 \leq \beta_2'X_2'X_2\beta_2$

for all $\beta_2$. Thus
$\lambda_m \leq \beta_2'X_2'(I - P_1)X_2\beta_2 = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$.
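The inequality of Proposition 2.7, and the two expressions for $\lambda_m$ used in its proof, can be checked on random draws of $\beta_2$. This is a sketch under assumed simulated regressors with $k_1 < k_2$ and $\sigma^2 = 1$; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
T, k1, k2 = 60, 2, 5                                   # k1 < k2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))

def proj(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

P1, P2 = proj(X1), proj(X2)
Q1, Q2 = np.eye(T) - P1, np.eye(T) - P2
V = np.linalg.inv(X1.T @ Q2 @ X1) - np.linalg.inv(X1.T @ X1)   # Var(q), sigma^2 = 1
V_pinv = np.linalg.pinv(V, rcond=1e-10)
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)                       # (X1'X1)^{-1} X1'X2
M_alt = X1 @ np.linalg.solve(X1.T @ P2 @ X1, X1.T) - P1         # bracket in the proof

def lambda_m(beta2):
    bias = A @ beta2
    return float(bias @ V_pinv @ bias)                 # Corollary 2.1

def lambda_m_alt(beta2):
    return float(beta2 @ X2.T @ M_alt @ X2 @ beta2)    # second line of the proof

def lambda_F(beta2):
    return float(beta2 @ X2.T @ Q1 @ X2 @ beta2)

draws = rng.normal(size=(1000, k2))
triples = [(lambda_m(b), lambda_m_alt(b), lambda_F(b)) for b in draws]
print(max(lm / lf for lm, _, lf in triples))
```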
Thus $m$ has fewer degrees of freedom and smaller non-centrality
than $F$ for the hypothesis $H_0$: $\beta_2 = 0$. To compare $m$
and $F$, recall that the power of a $\chi^2$ test of fixed size
(a) increases with the non-centrality parameter for fixed degrees
of freedom, and (b) decreases with the degrees of freedom for
fixed $\lambda$.^2 The m test will thus be relatively more
powerful when $k_1$ is much smaller than $k_2$ and $\lambda_m$ is
close to $\lambda_F$. In general, the relative power of the tests
depends upon the trade-off between degrees of freedom and
non-centrality; this can be calculated numerically from the
tables of the non-central $\chi^2$ distribution but nothing much
can be said analytically.

^2 This can be inferred from the Pearson and Hartley (1951)
charts of the power of the F test, which are reproduced in
Scheffe (1959), pp. 438-445. A particularly convenient set of
tables of the distribution of a non-central $\chi^2$ variate is
Haynam, Govindarajulu, and Leone (1962), part of which is
reproduced in Harter and Owen (1970).

Recognizing that the m test does not treat all estimable
functions of $\beta_2$ symmetrically, one is led to calculate the
direction in which the m test has greatest power. Without loss of
generality, we restrict our attention to $\beta_2$ of unit
length:

Proposition 2.8: $\lambda_m$ is maximized over $\beta_2$ whenever
$\beta_2$ lies in the column space of $(X_2'X_2)^{-1}X_2'X_1$;
i.e., for $\beta_2$ of the form
$\beta_2 = (X_2'X_2)^{-1}X_2'X_1\xi$ for any $k_1$ vector $\xi$.

Proof: Making the substitution
$\beta_2 = (X_2'X_2)^{-1}X_2'X_1\xi$ in Corollary 2.1, we obtain

    $\lambda_m = \xi'X_1'P_2[I - P_1]P_2X_1\xi = \beta_2'X_2'Q_1X_2\beta_2 = \lambda_F$

and the result follows from Proposition 2.7. It may be of some
interest to note that this direction of maximum power is
precisely the direction in which the least squares estimate of
$\beta_2$ is biased when $X_1$ is omitted from equation (2.1).

For the null hypothesis $H_0$ against the alternative $H_1$ there
are thus estimable functions of $\beta_2$ against which the m
test has strictly greater power than the F test. On the other
hand, there are functions of $\beta_2$ against which the F test
is more powerful.^3 In light of the optimum properties of the F
test, this ambiguity should not appear surprising. The F test is
uniformly most powerful among invariant tests of
$H_0$: $\beta_2 = 0$; the m test is not invariant for $H_0$ since
it depends upon the covariance between $X_1$ and $X_2$.

^3 This possibility was raised in Holly (1980a).

There are some alternatives against which the m test of $H_0$
performs particularly poorly. For $k_1 < k_2$, the null space of
$X_1'X_2$ contains non-zero vectors; for $\beta_2$ lying in that
space, the power function of the m test is flat with power equal
to the size of the test. As this persists in large samples, we
conclude that

Proposition 2.9: For $k_1 < k_2$, the m test is inconsistent for
$H_0$ against $H_1$.

Of course, from the viewpoint of testing for mis-specification in
the estimation of $\beta_1$, this inconsistency is irrelevant,
since alternatives in the null space of $X_1'X_2$ do not
contribute to the bias of the least squares estimate of
$\beta_1$. If, however, $H_0$: $\beta_2 = 0$ is of interest in
its own right, then Proposition 2.9 is a serious indictment of
the m test for $H_0$.
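The two extreme directions discussed in Propositions 2.8 and 2.9 can be exhibited numerically. In the sketch below (simulated regressors, $\sigma^2 = 1$, hypothetical variable names), `b_best` is a $\beta_2$ of the form $(X_2'X_2)^{-1}X_2'X_1\xi$, for which $\lambda_m = \lambda_F$, and `b_blind` lies in the null space of $X_1'X_2$, for which $\lambda_m = 0$ while $\lambda_F > 0$.

```python
import numpy as np

rng = np.random.default_rng(6)
T, k1, k2 = 60, 2, 4                                   # k1 < k2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))
I_T = np.eye(T)
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# Var(q) is non-singular when k1 < k2, so an ordinary inverse suffices here.
V_inv = np.linalg.inv(np.linalg.inv(X1.T @ (I_T - P2) @ X1) - np.linalg.inv(X1.T @ X1))
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)

def lam_m(b):
    return float((A @ b) @ V_inv @ (A @ b))

def lam_F(b):
    return float(b @ X2.T @ (I_T - P1) @ X2 @ b)

# Direction of Proposition 2.8: beta_2 = (X2'X2)^{-1} X2'X1 xi.
xi = rng.normal(size=k1)
b_best = np.linalg.solve(X2.T @ X2, X2.T @ X1 @ xi)

# Direction behind Proposition 2.9: beta_2 in the null space of X1'X2.
_, _, Vt = np.linalg.svd(X1.T @ X2)
b_blind = Vt[-1]

print(lam_m(b_best), lam_F(b_best))
print(lam_m(b_blind), lam_F(b_blind))
```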

For the linear model, this comparison of the F and m tests
clearly depends upon which null hypothesis is being considered.
From the viewpoint of specification tests, the relevant null
hypothesis is $H_0^*$: $(X_1'X_1)^{-1}X_1'X_2\beta_2 = 0$; the m
test is precisely equivalent to the F test for this hypothesis -
and thus is equivalent to the likelihood ratio, Wald, and
Lagrange multiplier tests for $H_0^*$. $H_0^*$, in turn, is
equivalent to $H_0$ for $k_1 \geq k_2$, so in this case, the m
and F tests for $H_0$ are equivalent. For $k_1 < k_2$, the m and
F tests for $H_0$ differ. The m test has smaller degrees of
freedom but also a smaller noncentrality parameter than the F
test for $H_0$; thus neither test is more powerful than the other
for all alternatives $H_1$.

If interest in the hypothesis $H_0$: $\beta_2 = 0$ centers around
possible simplification of the estimation problem for $\beta_1$,
the relevant null hypothesis is the mis-specification hypothesis
$H_0^*$. The m test is uniformly most powerful invariant for this
hypothesis, whereas the F statistic for $H_0$ defines a test for
$H_0^*$ of the wrong size, despite being UMP invariant for $H_0$.
For the mis-specification hypothesis, the m test is clearly
preferable.


3. The general non-linear model

The characterization of the m test for linear models in the
previous section extends easily to a non-linear framework. Using
a model and some results from Holly (1980a), we establish that
the m test is asymptotically equivalent to the likelihood ratio,
Wald and Lagrange multiplier tests of the hypothesis analogous to
$H_0^*$ in the previous section; i.e., that the asymptotic bias
in maximum likelihood estimates of a subset of parameters is zero
when the remaining parameters are constrained.

Following Holly (1980a), consider a family of models having
log-likelihood $L(\theta, \gamma)$ for $T$ observations, where
$(\theta, \gamma)$ are $(p, q)$ vectors of unknown parameters
respectively. A null hypothesis of interest is

    $H_0$: $\theta = \theta^0$

against a sequence of local alternatives

    $H_1$: $\theta = \theta_T = \theta^0 + \delta/\sqrt{T}$.

Deviating from Holly, we assume the framework of a specification
test: that we are primarily interested in estimating $\gamma$ and
are concerned about $H_0$ only insofar as it affects that
estimation. Accordingly, there are two ways of estimating
$\gamma$: imposing and not imposing $H_0$. For large $T$, these
estimators are the solution of

    $\frac{\partial L}{\partial\gamma}(\theta^0, \hat\gamma^0) = 0$   and   $\frac{\partial L}{\partial\delta}(\hat\delta) = 0$

respectively, where $\delta = (\theta' : \gamma')'$ and the true
parameter vector is
$\delta^0 = (\theta^{0\prime} : \gamma^{0\prime})'$.

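Under suitable regularity conditions, Holly (1980a) shows that

(3.1)    $\sqrt{T}(\hat\gamma - \gamma^0) \stackrel{a}{=} [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1}[\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0) - I_{\gamma\theta}I_{\theta\theta}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\theta}(\delta^0)]$,

(3.2)    $\sqrt{T}(\hat\gamma^0 - \gamma^0) \stackrel{a}{=} -I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0)$,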

and  using  his  equations  (3)  and  (4),  one  can  show  that 


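Note that

    $-\operatorname{plim} \frac{1}{T}\frac{\partial^2 L}{\partial\delta\,\partial\delta'}(\delta^0) = \begin{bmatrix} I_{\theta\theta} & I_{\theta\gamma} \\ I_{\gamma\theta} & I_{\gamma\gamma} \end{bmatrix}$

and that sufficient regularity is assumed so that

    $-\frac{1}{T}\frac{\partial^2 L}{\partial\delta\,\partial\delta'}(\delta)$

converges almost surely to the information matrix as
$\delta \to \delta^0$; see Holly (1980a) for details.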

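From equations (3.1) and (3.2), one can show that

    $\sqrt{T}(\hat\gamma - \gamma^0) \xrightarrow{d} N(0, [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1})$

and

    $\sqrt{T}(\hat\gamma^0 - \gamma^0) \xrightarrow{d} N(-I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta, I_{\gamma\gamma}^{-1})$.

By the argument in Hausman (1978),

(3.4)    $\sqrt{T}(\hat\gamma - \hat\gamma^0) \xrightarrow{d} N(I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta, [I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} - I_{\gamma\gamma}^{-1})$

which confirms Holly's algebraic derivation of equation (6).
Under some circumstances, the limiting covariance matrix in
equation (3.4) may be singular. Accordingly, we define the m test
statistic as

    $m = \sqrt{T}(\hat\gamma - \hat\gamma^0)'[\mathrm{Var}(\hat\gamma - \hat\gamma^0)]^+\sqrt{T}(\hat\gamma - \hat\gamma^0)$

with $\mathrm{Var}(\hat\gamma - \hat\gamma^0)$ denoting the
limiting covariance matrix in equation (3.4), since under
$H_0$: $\theta = \theta^0$, $\sqrt{T}(\hat\gamma - \hat\gamma^0)$
converges to a random variable having zero mean and under $H_1$,
the mean of the limiting distribution is
$I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta$.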

In general, we assume a minimal structure for the information
matrix. In particular, in contrast to Holly's (1980a) equation
(10), we assume that
rank($I_{\gamma\theta}$) = rank($I_{\theta\gamma}$) = min($p$, $q$),
so that all parameters provide information useful for estimating
any other parameter. Since, from Lemma 2.2,

    $[I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma}]^{-1} = I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}$,

rank[$\mathrm{Var}(\hat\gamma - \hat\gamma^0)$] =
rank($I_{\gamma\theta}$) = min($p$, $q$) under our assumptions.
Thus

Proposition 3.1: Under $H_0$, $m$ converges in distribution to a
$\chi^2$ random variable with min($p$, $q$) degrees of freedom.
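The rank statement behind Proposition 3.1 can be checked on an arbitrary positive definite matrix standing in for the information matrix. The sketch below is illustrative only: the "information matrix" is a randomly generated assumption, and the function names are hypothetical. It verifies the identity obtained from Lemma 2.2 and the rank of the limiting covariance matrix in (3.4).

```python
import numpy as np

def random_information(p, q, seed):
    """A hypothetical positive definite information matrix, returned in blocks
    (I_tt, I_tg, I_gt, I_gg) for theta (p-vector) and gamma (q-vector)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(5 * (p + q), p + q))
    I = Z.T @ Z / Z.shape[0]
    return I[:p, :p], I[:p, p:], I[p:, :p], I[p:, p:]

def contrast_covariance(p, q, seed=7):
    I_tt, I_tg, I_gt, I_gg = random_information(p, q, seed)
    unrestricted = np.linalg.inv(I_gg - I_gt @ np.linalg.solve(I_tt, I_tg))
    restricted = np.linalg.inv(I_gg)
    # Second term on the right-hand side of the identity from Lemma 2.2.
    R = I_tt - I_tg @ restricted @ I_gt
    via_lemma = restricted @ I_gt @ np.linalg.solve(R, I_tg) @ restricted
    return unrestricted - restricted, via_lemma

results = {}
for p, q in [(2, 4), (4, 2), (3, 3)]:
    V, V_lemma = contrast_covariance(p, q)
    rank = int(np.linalg.matrix_rank(V_lemma, tol=1e-8))
    gap = float(np.max(np.abs(V - V_lemma)))
    results[(p, q)] = (rank, gap)
print(results)
```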

An alternative hypothesis of some interest in the context of
specification tests is that

    $H_0^*$: $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta = I_{\gamma\gamma}^{-1}I_{\gamma\theta}\theta^0 \iff I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta = 0$

i.e., that the asymptotic bias in $\hat\gamma^0$ - the estimator
which uses the information that $\theta = \theta^0$ - is zero.
Note from equation (3.4) that the limiting distribution of $m$
under $H_0^*$ is the same as that under $H_0$.

The Wald test of $H_0^*$ is based on the length of the vector of
unconstrained estimates

    $\sqrt{T}I_{\gamma\gamma}^{-1}I_{\gamma\theta}(\hat\theta - \theta^0)$

which, from equation (3.3), converges in probability to

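(3.5)    $I_{\gamma\gamma}^{-1}I_{\gamma\theta}\delta + I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}I_{\theta\gamma}I_{\gamma\gamma}^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\gamma}(\delta^0) - I_{\gamma\gamma}^{-1}I_{\gamma\theta}[I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta}]^{-1}\frac{1}{\sqrt{T}}\frac{\partial L}{\partial\theta}(\delta^0)$.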
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Proposition  3-2:  For  any  (q,p),  /T  I  -^I  ^(e-e")  converges 
YY  Yb 

in  probability  to   /T(y-Y°). 
Proof:   Combining  equations  (3.1)  and  (3.2)*, 

(3.6)  /tCy->»)  J  i;li^,6  -  [I^^-I^,i;^l,^r\ele6  !?(«") 

Using  the  identity 

YY   y6  66  6y         YY    YY  Y9   66  Qy    yY  Y6     9y  YY 

the  —  terms  in  equations  (3-5)  and  (3.6)  are  equal.  For  the 

9L 

jr-  terms,  begin  with  the  identity 

'■^ee~-'-eY"'"YY^Y9-"-'''e6"^eY''"YY^Ye-'   °  -"-q 

and  premultlply  both  sides  by  I~  I  „i7q  to  obtain  upon  rearranger.ent 
•r  r-    J  J       yy   YQ  9 9  '^  = 


(3.7) 


YY  y6   66   6y  YY  Y6        YY  Y6  99 


YY  y9  99^eY  YY  Y9'-  96   9y  YY  yS^ 


^^""^  3.1:  i;^9Y\^Y9^^ee-^eYSYSe^"'  =  ^^6e-^6YS^Ye^'' 

"  ■'•9y"''YY^Y9^9  9  • 
Proof:  The  following  identity  can  be  checked  by  expansion: 
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'^^9e"^9Y^YY"^Y9-''^ee-^0Y"^YY'^Ye  "  ^eY^'^YY^Ye^ee'^^ee'-^eY'^YY'^Ye-'* 

and  the  lemma  follows  by  pre  and  post  multiplying  by 

•^^ee'-^SY-'-YY'^Ye-'  * 

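Substituting the lemma into the second term in equation (3.7), we obtain

    $I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,[I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta}]^{-1} = I_{\gamma\gamma}^{-1} I_{\gamma\theta} I_{\theta\theta}^{-1}$

        $+\; I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,[I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta}]^{-1}\, I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta} I_{\theta\theta}^{-1}$

        $=\; \{ I_{\gamma\gamma}^{-1} + I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,[I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta}]^{-1}\, I_{\theta\gamma} I_{\gamma\gamma}^{-1} \}\, I_{\gamma\theta} I_{\theta\theta}^{-1}$

        $=\; [I_{\gamma\gamma} - I_{\gamma\theta} I_{\theta\theta}^{-1} I_{\theta\gamma}]^{-1}\, I_{\gamma\theta} I_{\theta\theta}^{-1}$

which establishes the proposition.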
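
As a numerical sanity check on the algebra in this proof, the following minimal sketch draws an arbitrary positive definite matrix, partitions it like an information matrix, and confirms both Lemma 3.1 and the equality of the two coefficient matrices on the $\partial L/\partial\theta$ terms. The dimensions, the random matrix, and all variable names are illustrative assumptions and are not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 3, 2  # illustrative sizes: dim(gamma) = q, dim(theta) = p

# Arbitrary symmetric positive definite matrix standing in for the information matrix.
A = rng.standard_normal((q + p, q + p))
info = A @ A.T + (q + p) * np.eye(q + p)
I_gg, I_gt = info[:q, :q], info[:q, q:]
I_tg, I_tt = info[q:, :q], info[q:, q:]

inv = np.linalg.inv
S_t = I_tt - I_tg @ inv(I_gg) @ I_gt  # I_tt - I_tg I_gg^{-1} I_gt
S_g = I_gg - I_gt @ inv(I_tt) @ I_tg  # I_gg - I_gt I_tt^{-1} I_tg

# Lemma 3.1: I_tt^{-1} B S_t^{-1} = S_t^{-1} B I_tt^{-1}, with B = I_tg I_gg^{-1} I_gt.
B = I_tg @ inv(I_gg) @ I_gt
lemma_holds = np.allclose(inv(I_tt) @ B @ inv(S_t), inv(S_t) @ B @ inv(I_tt))

# Final step of the proof: I_gg^{-1} I_gt S_t^{-1} = S_g^{-1} I_gt I_tt^{-1}.
coefs_equal = np.allclose(inv(I_gg) @ I_gt @ inv(S_t), inv(S_g) @ I_gt @ inv(I_tt))

print(lemma_holds, coefs_equal)
```

A check of this kind only exercises the identities at one randomly drawn matrix; it is not a substitute for the proof.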

Since $\sqrt{T}\, I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,(\hat\theta - \theta^{\circ})$ has the same limiting distribution as $\sqrt{T}\,(\hat\gamma - \hat\gamma^{\circ})$, the m test statistic and the Wald test statistic for $H_0^*$ have the same limiting distribution. Thus, asymptotically, the m test is equivalent to a Wald, likelihood ratio (LR), and Lagrange multiplier test of $H_0^*$ - the hypothesis that imposing $H_0: \theta = \theta^{\circ}$ leaves the maximum likelihood estimator for $\gamma$ asymptotically unbiased.

The relationship between the m test and the Wald test of $H_0$ is perfectly analogous to that discussed for the linear case in sections 2.2 and 2.3. Briefly, assuming rank($I_{\gamma\theta}$) = min(p,q),

    $I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,(\theta - \theta^{\circ}) = 0$  iff  $(\theta - \theta^{\circ}) = 0$

for $q \geq p$, so that

Proposition 3.3: If $q \geq p$ and rank($I_{\gamma\theta}$) = p, the m test statistic has the same limiting distribution as the LR (or Wald) test statistic for $H_0$.

Moreover, for q < p, the m test has fewer degrees of freedom and a smaller noncentrality parameter than the LR test of $H_0$. Neither test for $H_0$ dominates the other for all estimable functions of $(\theta - \theta^{\circ})$; there exist directions in which the power of the m test equals its size and there exist directions in which the m test has strictly greater power than the LR test.

If the mis-specification hypothesis $H_0^*$ is the correct hypothesis, the m test is the correct test. The LR test statistic for $H_0$ has the wrong size for $H_0^*$ (when q < p), and the m test - being asymptotically equivalent to the LR test of $H_0^*$ - possesses the usual local power properties. Echoing the conclusion of Section 2, when interest in $H_0: \theta = \theta^{\circ}$ derives from the desire to impose this restriction when estimating $\gamma$, the relevant null hypothesis is $H_0^*: I_{\gamma\gamma}^{-1} I_{\gamma\theta}\,(\theta - \theta^{\circ}) = 0$ and the relevant test is the m test.


For $(\theta - \theta^{\circ})$ in the null space of $I_{\gamma\gamma}^{-1} I_{\gamma\theta}$ and $(\theta - \theta^{\circ})$ in the column space of $I_{\theta\theta}^{-1} I_{\theta\gamma}$ respectively.
4.  Testing the legitimacy of instruments

In this section, we derive specification tests of overidentifying assumptions for a single structural equation. Specifically, we develop a test of the hypothesis that certain variables are uncorrelated with the structural disturbance term. This hypothesis, as we shall show, includes both overidentifying exclusion restrictions of the Cowles Commission type and restrictions on the structural disturbance covariance matrix. Let

(4.1)     $y_1 = Y_1\beta_1 + Z_1\gamma_1 + \varepsilon_1 = X_1\delta_1 + \varepsilon_1$

be the first structural equation in a system of simultaneous equations denoted

    $YB + Z\Gamma = \varepsilon$,      cov($\varepsilon$) = $\Sigma$.

As usual, assume there are $g_1$ endogenous variables $Y_1$ and $k_1$ predetermined variables $Z_1$ present in the first equation and that we can use no coefficient restrictions from equations other than the first. To identify and estimate the parameters of this equation, we need $g_1 + k_1$ instruments; and we may use (i) the $k_1$ predetermined variables $Z_1$, (ii) any other predetermined variables which are correlated with $Y_1$, and (iii) any endogenous variables $y_j, \ldots, y_k$ which are correlated with $Y_1$ but asymptotically uncorrelated with $\varepsilon_1$. The prior information that certain variables are uncorrelated with the disturbance in the first equation is thus very important, and it is equally important to be able to test such information.

Let W denote a $T \times w$ matrix of observations on w instruments which we maintain to be uncorrelated with $\varepsilon_1$ under all circumstances. Moreover, we must assume that $w \geq g_1 + k_1$ so that equation (4.1) is at least just-identified. We need not, however, include all of the $Z_1$ in W; if the exogeneity of a particular $Z_1$ is in doubt, it may be tested, provided the equation is at least just-identified under both the null and alternative hypotheses.

Let $W_1$ denote a $T \times w_1$ matrix of observations on $w_1 > w$ instruments, which include all the instruments in W. Specifically, we assume that the column space of W is a proper subspace of the column space of $W_1$; the difference in dimensions will be denoted $w_1 - w = w^* > 0$, and a set of vectors spanning the column space of $W_1$ will be denoted $[W \,\vdots\, W^*]$. The orthocomplement of the column space of W in the column space of $W_1$ is a $w^*$ dimensional subspace spanned by the columns of $W^+ = Q_W W^*$, where $Q_W = I - W(W'W)^{-1}W' = I - P_W$.

A null hypothesis of some interest is that the $w^*$ "extra" instruments $W^*$ are asymptotically uncorrelated with the structural disturbance:

    $H_0$:  $\mathrm{plim}_{T \to \infty}\, \frac{1}{T} W^{*\prime}\varepsilon_1 = 0$,

against the alternative $H_1$: plim $\frac{1}{T} W^{*\prime}\varepsilon_1 \neq 0$. This is important for two reasons. If we treat the columns of $W^*$ as uncorrelated with $\varepsilon_1$ and they are not, the resulting instrumental variables estimator for $\delta_1$ will be inconsistent. If we treat $W^*$ as correlated with $\varepsilon_1$ and it is not, the corresponding instrumental variables estimator will be inefficient in the following (presumably well-known) sense. Let $\hat\delta_1$ and $\hat\delta_1^*$ denote the two stage least squares (2SLS) estimates of $\delta_1$ using W and $W_1$ as instruments respectively.

Proposition 4.1: If plim $\frac{1}{T} W_1'\varepsilon_1 = 0$, the difference between the covariance matrices of the limiting distributions of $\hat\delta_1$ and $\hat\delta_1^*$ is a non-negative definite matrix.

Proof: The proof follows directly from the observation that $X_1'P_{W_1}X_1 - X_1'P_W X_1 = X_1'P_{W^+}X_1$ is non-negative definite, where $P_A$ denotes the orthogonal projection operator onto the column space of A.
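
The projection identity behind this proof can be illustrated numerically. The sketch below is only an illustration under assumed, randomly generated matrices (the sizes, the seed, and the helper `proj` are hypothetical): it builds $W^+ = Q_W W^*$, checks that $P_{W_1} - P_W = P_{W^+}$ holds inside the quadratic form in $X_1$, and confirms that the resulting matrix has no negative eigenvalues beyond rounding error.

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A (A times its pseudo-inverse)."""
    return A @ np.linalg.pinv(A)

rng = np.random.default_rng(1)
T, w, w_star, k = 200, 3, 2, 3  # illustrative sizes only

W = rng.standard_normal((T, w))            # maintained instruments
W_star = rng.standard_normal((T, w_star))  # "extra" instruments
W1 = np.hstack([W, W_star])                # W1 = [W : W*]
X1 = rng.standard_normal((T, k))           # stand-in for the regressors X_1

Q_W = np.eye(T) - proj(W)
W_plus = Q_W @ W_star                      # spans the orthocomplement of W in W1

lhs = X1.T @ proj(W1) @ X1 - X1.T @ proj(W) @ X1
rhs = X1.T @ proj(W_plus) @ X1

identity_holds = np.allclose(lhs, rhs)
min_eig = np.linalg.eigvalsh((rhs + rhs.T) / 2).min()
print(identity_holds, min_eig >= -1e-8)
```

Because $P_{W^+}$ has rank $w^* = 2$ here while $X_1$ has three columns, the difference matrix is singular, so the smallest eigenvalue is zero up to rounding rather than strictly positive.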

The hypothesis $H_0$ is somewhat unusual among tests of assumptions used in simultaneous equations estimation. For exogenous variables among the columns of $W^*$, $H_0$ tests for exogeneity. For endogenous columns of $W^*$ (e.g., $y_i$), $H_0$ imposes the restriction that

    $(B'^{-1}\Sigma)_{i1} = 0$

which is a complicated combination of disturbance covariance and coefficient restrictions. In general, for $y_i$ and $\varepsilon_1$ to be uncorrelated, cov($\varepsilon_i$, $\varepsilon_1$) must equal 0, equations (i,1) must be relatively triangular, and cov($\varepsilon_j$, $\varepsilon_1$) must equal 0 for all j such that equations (j,i) are not relatively triangular. See Hausman-Taylor (1980b) for definitions and details. If $W^* = Y_1$ and $W = Z$, we have the situation treated by Wu (1973); for B triangular, we have the limited information version of Holly's (1980b) test for the diagonality of $\Sigma$.

In the spirit of specification tests, we assume we are not interested in $H_0$ directly. Rather, we wish to know the consequences for estimating $\delta_1$ of imposing $H_0$; this is given by the length of the vector

(4.2)     $q = C y_1$.

Note that under $H_0$, q converges to a normal random variable with mean 0 and variance $\sigma_\varepsilon^2\,[(X_1'P_W X_1)^{-1} - (X_1'P_{W_1}X_1)^{-1}]$; under $H_1$, the mean vector is no longer 0. Thus significant deviations of q from the zero vector cast doubt upon $H_0$.

Using results in Section 2, it is easy to verify that

Proposition 4.2: If rank(C) = c, then under $H_0$

    $m = \frac{1}{\hat\sigma_\varepsilon^2}\, q'[CC']^{+} q = \frac{1}{\hat\sigma_\varepsilon^2}\, q'[(X_1'P_W X_1)^{-1} - (X_1'P_{W_1}X_1)^{-1}]^{+} q$

is asymptotically distributed as $\chi^2$ with c degrees of freedom, where $\hat\sigma_\varepsilon^2$ is any consistent estimate of $\sigma_\varepsilon^2$.
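
To make the construction of the m statistic concrete, here is a minimal sketch on simulated data. Everything specific in it is an assumption made for illustration: the data generating process, the sample size, the sign convention $q = \hat\delta_1 - \hat\delta_1^*$ (the quadratic form is unaffected by the sign), the use of the Moore-Penrose pseudo-inverse as the generalized inverse, and the estimate of $\sigma_\varepsilon^2$ from the 2SLS residuals that use W only. One regressor is taken to be a column of W, so that $CC'$ is singular and c = 1.

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A)

rng = np.random.default_rng(2)
T = 300

# Hypothetical data: four maintained instruments W and one valid extra instrument W*.
W = rng.standard_normal((T, 4))
W_star = rng.standard_normal((T, 1))
W1 = np.hstack([W, W_star])

eps = rng.standard_normal(T)                 # structural disturbance
v = 0.5 * eps + rng.standard_normal(T)       # correlation with eps makes Y1 endogenous
Y1 = W.sum(axis=1) + 0.8 * W_star[:, 0] + v
Z1 = W[:, 0]                                 # included regressor treated as exogenous
X1 = np.column_stack([Y1, Z1])
y1 = 1.0 * Y1 + 0.5 * Z1 + eps

P_W, P_W1 = proj(W), proj(W1)
A_W = np.linalg.inv(X1.T @ P_W @ X1)
A_W1 = np.linalg.inv(X1.T @ P_W1 @ X1)

# C maps y1 into the difference of the two 2SLS estimators (sign is an assumption).
C = A_W @ X1.T @ P_W - A_W1 @ X1.T @ P_W1
q = C @ y1
V = A_W - A_W1                               # should equal C C'
cc_matches = np.allclose(C @ C.T, V, rtol=1e-6, atol=1e-12)

# Numerical rank of C C', using a relative tolerance on the singular values.
s = np.linalg.svd(V, compute_uv=False)
c = int((s > 1e-8 * s.max()).sum())

# Consistent variance estimate from the 2SLS residuals based on W alone.
delta_hat = A_W @ X1.T @ P_W @ y1
resid = y1 - X1 @ delta_hat
sigma2_hat = resid @ resid / T

m = q @ np.linalg.pinv(V, rcond=1e-8) @ q / sigma2_hat
print(f"c = {c}, m = {m:.3f}")
```

Under these assumptions m would be compared with a $\chi^2$ critical value with c degrees of freedom; the explicit `rcond` cutoff is a design choice that keeps rounding noise in the null direction of $CC'$ from being inverted.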

Assuming $\delta_1$ is identified in equation (4.1).
Proposition 4.3: Rank(C) = c = min[rank($X_1'Q_W$), $w^*$]

Proof: Repeating steps of the proof of Proposition 2.3, we obtain

Since  T-k^-g^  >  It^."*"^!*  ^^"^  ^^^  "^  ^^"^  ^^l^Wt^'  where 

The result then follows by noting that $X_1'P_{W^+}W^* = X_1'Q_W W^*$.

Since some columns of $X_1$ may be columns of W in some applications, rank($X_1'Q_W$) may be strictly less than $k_1 + g_1$. This will happen whenever we accept the exogeneity of an explanatory variable in equation (4.1) rather than subject it to test.

As in Sections 2 and 3, consider the null hypothesis

    $H_0^*$:  plim $(X_1'P_{W_1}X_1)^{-1} X_1'P_{W_1}\varepsilon_1 = 0$

        $\iff$  plim $(X_1'P_{W_1}X_1)^{-1} X_1'Q_W W^*(W^{*\prime}Q_W W^*)^{-1} W^{*\prime}\varepsilon_1 = 0$,


which states that the asymptotic bias in the 2SLS estimator for $\delta_1$ is zero when the columns of $W^*$ are used as instruments in addition to those of W. As before, the null hypothesis $H_0$ restricts all linear functions of the $w^*$ vector plim $\frac{1}{T} W^{*\prime}\varepsilon_1$ to be zero, whereas $H_0^*$ restricts only a subset of those functions. If rank($X_1'Q_W$) $\geq w^*$, the null hypotheses are identical; if rank($X_1'Q_W$) $< w^*$, they differ. Thus particular tests (e.g., Wald or LR tests) of $H_0$ and $H_0^*$ will be identical if rank($X_1'Q_W$) $\geq w^*$ and will differ otherwise, as we observed for the linear model in Proposition 2.5.

Moreover, in the limited information framework, we can relate the m test in Proposition 4.2 to the familiar trinity of asymptotically equivalent tests of $H_0^*$. We argued elsewhere (Hausman-Taylor (1980a), section 4.2) that the 2SLS estimator for $\delta_1$ in equation (4.1) is asymptotically equivalent to the full information maximum likelihood estimator (FIML) for $\delta_1$ in the system

(4.3)     $y_1 = Y_1\beta_1 + Z_1\gamma_1 + \varepsilon_1$

          $Y_1 = Z\Pi + V$


where the correlations between the columns of V and $\varepsilon_1$ are unrestricted and all instruments are columns of Z. If we do not impose the restrictions $H_0^*$, the FIML estimate of $\delta_1$ is $\hat\delta_1$ and the Wald test of $H_0^*$ is based on the length of the vector

    = $-q$

from equation (4.2). Thus, provided $\hat\sigma_\varepsilon^2$ is based on the unrestricted 2SLS residuals, we have

Proposition 4.3: The m test statistic is identical to the Wald test statistic for the null hypothesis $H_0^*$.

Asymptotically, then, the m test for the legitimacy of instruments is equivalent to a Lagrange multiplier or LR test of the mis-specification hypothesis $H_0^*$. Since W and $W^*$ can be chosen arbitrarily, there are a number of interesting special cases of this test, involving both coefficient restrictions of the Cowles Commission type and disturbance variance and covariance restrictions of the type discussed by Fisher (1966), chapters 3 and 4. We discuss these applications elsewhere (Hausman-Taylor (1980b)); the point developed here is the same as that of Sections 2 and 3 - that the m test is asymptotically equivalent to the usual tests of the mis-specification hypothesis that imposing $H_0$ causes no asymptotic bias in the maximum likelihood estimator of the parameters of interest.